install.packages(c("tidyverse",
                   "here",
                   "tidylog",
                   "summarytools"))
13/01/2025
https://rstats-courses.github.io/CursoR-AEET-2025/materiales.html
Data exploration lets us check data quality and quickly generate and test hypotheses, identifying promising leads to analyze in more depth later.
Visualizing the data is a good start, but on its own it is usually not enough, since it often requires transforming the data first.
[1] "broom" "conflicted" "cli" "dbplyr"
[5] "dplyr" "dtplyr" "forcats" "ggplot2"
[9] "googledrive" "googlesheets4" "haven" "hms"
[13] "httr" "jsonlite" "lubridate" "magrittr"
[17] "modelr" "pillar" "purrr" "ragg"
[21] "readr" "readxl" "reprex" "rlang"
[25] "rstudioapi" "rvest" "stringr" "tibble"
[29] "tidyr" "xml2" "tidyverse"
tidylog: reports what each operation does to the dataset
summarytools: produces complete summaries of datasets
read.table(), read.csv(), readRDS(), read.delim()
Useful arguments: sep, dec, comment.char, na.strings, stringsAsFactors
read_csv(), read_csv2(), read_table(), read_delim()
Faster than the base R functions, produces tibbles, does not automatically convert characters to factors, and does not use row names.
Useful arguments: delim, comment, na, col_types, skip_empty_rows, guess_max
read_excel(), read_xls(), read_xlsx()
Useful arguments: sheet, col_types, skip
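As an illustration of the readr arguments listed above, here is a minimal sketch that reads a small CSV supplied as literal text. The column names, the ";" separator, and the "-999" missing-value code are made up for the example, and passing literal data with I() assumes readr 2.0 or later.

```r
library(readr)

# Hypothetical CSV content, given inline so the sketch runs without a file
csv_text <- "site;count;note\nAND;22;ok\nBNZ;-999;check\n"

toy <- read_delim(
  I(csv_text),                 # I() marks the string as literal data, not a path
  delim = ";",                 # field separator
  na = c("", "NA", "-999"),    # strings to be read as missing values
  col_types = cols(            # declare column types instead of guessing them
    site = col_character(),
    count = col_double(),
    note = col_character()
  )
)

toy
```

Declaring col_types up front also silences the column specification message that read_csv() prints when it has to guess.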
The here() function lets you always refer to files relative to the directory where the project is located.
Example using an absolute path:
Example using a path relative to the project:
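The two kinds of path can be sketched as follows. The folder and file names are hypothetical (they are not the ones used in the course), and the sketch assumes the here package is installed.

```r
library(here)

# Absolute path: only valid on the machine where it was written (hypothetical path)
absolute_path <- "/home/user/projects/my_project/data/my_data.csv"

# Project-relative path: here() builds the path starting from the project root,
# so the same code works wherever the project folder is copied
relative_path <- here("data", "my_data.csv")

relative_path
```

The value of relative_path can then be passed to any reading function, for instance read_csv(relative_path).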
Mechanism for chaining functions:
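A minimal sketch of chaining with the native pipe |> (available from R 4.1), using a made-up numeric vector. The nested and piped versions compute the same thing; the pipe simply passes the left-hand result as the first argument of the next call.

```r
values <- c(1, 4, 9, 16)

# Nested calls: must be read from the inside out
nested <- round(mean(sqrt(values)), 1)

# Piped calls: read from left to right, one step per line
piped <- values |>
  sqrt() |>
  mean() |>
  round(1)

identical(nested, piped)
```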
Rows: 213062 Columns: 14
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (8): site_name, megaplot, plot, plant_ID, species_name, height_diameter_...
dbl (6): trap, year, count, stem_diameter_cm, trap_area_m2, burned
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 213,062
Columns: 14
$ site_name <chr> "AND", "AND", "AND", "AND", "AND", "AND", "AND",…
$ megaplot <chr> "Bare Mountain", "Bare Mountain", "Bare Mountain…
$ plot <chr> "CNCT_01", "CNCT_01", "CNCT_01", "CNCT_01", "CNC…
$ trap <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ plant_ID <chr> "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01ABAM1", …
$ species_name <chr> "Abies_amabilis", "Abies_amabilis", "Abies_amabi…
$ year <dbl> 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, …
$ count <dbl> 22, 0, 0, 2, 0, 2, 108, 0, 0, 7, 0, 0, 2, 0, 12,…
$ stem_diameter_cm <dbl> 56.6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA…
$ trap_area_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ height_diameter_taken <chr> "Breast Height", "Breast Height", "Breast Height…
$ burned <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, …
$ general_method <chr> "PARTIALCONECOUNT", "PARTIALCONECOUNT", "PARTIAL…
$ methods_notes <chr> "all cones visible from established view points …
# A tibble: 6 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1962 22
2 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1963 0
3 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1964 0
4 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1965 2
5 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1966 0
6 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1967 2
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
arrange() - Sort rows by variable values
rename() - Rename variables
relocate() - Reorder variables
select() - Extract variables
# A tibble: 213,062 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1963 0
2 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1964 0
3 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1966 0
4 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1969 0
5 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1970 0
6 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1972 0
7 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1973 0
8 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1975 0
9 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1977 0
10 AND Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 1981 0
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
From highest to lowest:
# A tibble: 213,062 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 LUQ 1 1 92 <NA> Cecropia_schreberiana 1997 1114340
2 LUQ 1 1 93 <NA> Ficus_trigonata 2015 106650
3 LUQ 1 1 92 <NA> Ficus_trigonata 2013 69450
4 LUQ 1 1 93 <NA> Ficus_trigonata 2018 44090
5 LUQ 1 1 109 <NA> Ficus_trigonata 2015 39670
6 LUQ 1 1 92 <NA> Ficus_trigonata 2015 35075
7 LUQ 1 1 93 <NA> Ficus_trigonata 2016 33500
8 LUQ 1 1 92 <NA> Ficus_trigonata 2008 33000
9 LUQ 1 1 107 <NA> Ficus_trigonata 2011 32689
10 LUQ 1 1 99 <NA> Cecropia_schreberiana 2009 30594
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
In hierarchical order:
# A tibble: 213,062 × 14
site_name megaplot plot trap plant_ID species_name year count
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl>
1 AEC adirondack adirondack 971 <NA> Acer_rubrum 2009 191
2 AEC adirondack adirondack 971 <NA> Acer_rubrum 2004 171
3 AEC adirondack adirondack 971 <NA> Acer_rubrum 1995 141
4 AEC adirondack adirondack 941 <NA> Acer_rubrum 1994 105
5 AEC adirondack adirondack 941 <NA> Acer_rubrum 1995 85
6 AEC adirondack adirondack 972 <NA> Acer_rubrum 2007 82
7 AEC adirondack adirondack 971 <NA> Acer_rubrum 2008 81
8 AEC adirondack adirondack 971 <NA> Acer_rubrum 1993 79
9 AEC adirondack adirondack 941 <NA> Acer_rubrum 1991 77
10 AEC adirondack adirondack 938 <NA> Acer_rubrum 2004 72
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
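The calls that produced the three orderings above are not shown, so here is a minimal sketch on a toy tibble (invented values and column names) of how ascending, descending, and hierarchical sorting could be written with arrange().

```r
library(dplyr)

# Toy data, not the course dataset
toy <- tibble(
  site = c("B", "A", "A", "B"),
  species = c("y", "x", "x", "y"),
  count = c(5, 0, 12, 7)
)

# Ascending order (the default)
ascending <- toy |> arrange(count)

# From highest to lowest
descending <- toy |> arrange(desc(count))

# Hierarchical: by site, then species, then count from highest to lowest
hierarchical <- toy |> arrange(site, species, desc(count))

hierarchical
```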
rename: renamed one variable (site)
# A tibble: 213,062 × 14
site megaplot plot trap plant_ID species_name year count stem_diameter_cm
<chr> <chr> <chr> <dbl> <chr> <chr> <dbl> <dbl> <dbl>
1 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1962 22 56.6
2 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1963 0 NA
3 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1964 0 NA
4 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1965 2 NA
5 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1966 0 NA
6 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1967 2 NA
7 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1968 108 NA
8 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1969 0 NA
9 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1970 0 NA
10 AND Bare Mo… CNCT… NA CNCT_01… Abies_amabi… 1971 7 NA
# ℹ 213,052 more rows
# ℹ 5 more variables: trap_area_m2 <dbl>, height_diameter_taken <chr>,
# burned <dbl>, general_method <chr>, methods_notes <chr>
relocate: columns reordered (site_name, year, megaplot, plot, trap, …)
# A tibble: 213,062 × 14
site_name year megaplot plot trap plant_ID species_name count
<chr> <dbl> <chr> <chr> <dbl> <chr> <chr> <dbl>
1 AND 1962 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 22
2 AND 1963 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
3 AND 1964 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
4 AND 1965 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 2
5 AND 1966 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
6 AND 1967 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 2
7 AND 1968 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 108
8 AND 1969 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
9 AND 1970 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 0
10 AND 1971 Bare Mountain CNCT_01 NA CNCT_01ABAM1 Abies_amabilis 7
# ℹ 213,052 more rows
# ℹ 6 more variables: stem_diameter_cm <dbl>, trap_area_m2 <dbl>,
# height_diameter_taken <chr>, burned <dbl>, general_method <chr>,
# methods_notes <chr>
select: dropped 10 variables (megaplot, plot, trap, plant_ID, stem_diameter_cm, …)
# A tibble: 213,062 × 4
site_name year species_name count
<chr> <dbl> <chr> <dbl>
1 AND 1962 Abies_amabilis 22
2 AND 1963 Abies_amabilis 0
3 AND 1964 Abies_amabilis 0
4 AND 1965 Abies_amabilis 2
5 AND 1966 Abies_amabilis 0
6 AND 1967 Abies_amabilis 2
7 AND 1968 Abies_amabilis 108
8 AND 1969 Abies_amabilis 0
9 AND 1970 Abies_amabilis 0
10 AND 1971 Abies_amabilis 7
# ℹ 213,052 more rows
Drop variables:
select: dropped 3 variables (megaplot, plot, trap)
# A tibble: 213,062 × 11
site_name plant_ID species_name year count stem_diameter_cm trap_area_m2
<chr> <chr> <chr> <dbl> <dbl> <dbl> <dbl>
1 AND CNCT_01ABAM1 Abies_amabi… 1962 22 56.6 NA
2 AND CNCT_01ABAM1 Abies_amabi… 1963 0 NA NA
3 AND CNCT_01ABAM1 Abies_amabi… 1964 0 NA NA
4 AND CNCT_01ABAM1 Abies_amabi… 1965 2 NA NA
5 AND CNCT_01ABAM1 Abies_amabi… 1966 0 NA NA
6 AND CNCT_01ABAM1 Abies_amabi… 1967 2 NA NA
7 AND CNCT_01ABAM1 Abies_amabi… 1968 108 NA NA
8 AND CNCT_01ABAM1 Abies_amabi… 1969 0 NA NA
9 AND CNCT_01ABAM1 Abies_amabi… 1970 0 NA NA
10 AND CNCT_01ABAM1 Abies_amabi… 1971 7 NA NA
# ℹ 213,052 more rows
# ℹ 4 more variables: height_diameter_taken <chr>, burned <dbl>,
# general_method <chr>, methods_notes <chr>
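The outputs above come from rename(), relocate(), and select(), but the calls themselves are not shown. The following is a sketch of what such calls might look like on a toy tibble; the exact arguments used in the course are an assumption.

```r
library(dplyr)

# Toy data reusing a few column names from the dataset
toy <- tibble(
  site_name = c("AND", "BNZ"),
  megaplot = c("m1", "m2"),
  year = c(1962, 1957),
  count = c(22, 3)
)

# rename(): new_name = old_name
renamed <- toy |> rename(site = site_name)

# relocate(): move year right after site_name
relocated <- toy |> relocate(year, .after = site_name)

# select(): keep only the listed variables
kept <- toy |> select(site_name, year, count)

# select() with a minus sign: drop variables
dropped <- toy |> select(-megaplot)
```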
The select() function lets us select, rename, and relocate variables, all at once!
Rows: 213,062
Columns: 8
$ site <chr> "AND", "AND", "AND", "AND", "AND", "AND", "AND", "AND", "…
$ year <dbl> 1962, 1963, 1964, 1965, 1966, 1967, 1968, 1969, 1970, 197…
$ species_name <chr> "Abies_amabilis", "Abies_amabilis", "Abies_amabilis", "Ab…
$ plant_ID <chr> "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01A…
$ count <dbl> 22, 0, 0, 2, 0, 2, 108, 0, 0, 7, 0, 0, 2, 0, 12, 0, 21, 1…
$ method <chr> "PARTIALCONECOUNT", "PARTIALCONECOUNT", "PARTIALCONECOUNT…
$ stem_cm <dbl> 56.6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ trap_area_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
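A reduced dataset like the one shown above could be obtained with a single select() call. This is a sketch on toy data, not the original call: the order of the arguments sets the column order, and new_name = old_name renames on the fly.

```r
library(dplyr)

# Toy data reusing some of the original column names
toy <- tibble(
  site_name = c("AND", "BNZ"),
  megaplot = c("m1", "m2"),
  general_method = c("PARTIALCONECOUNT", "TRAP"),
  year = c(1962, 1957),
  count = c(22, 3)
)

# Select, rename, and reorder in one step
compact <- toy |>
  select(site = site_name, year, count, method = general_method)

names(compact)
```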
Data Frame Summary
dt
Dimensions: 213062 x 1
Duplicates: 212921
-----------------------------------------------------------------------------------------------------
No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
---- -------------- ------------------------- -------------------- ------------- ---------- ---------
1 species_name 1. Abies_amabilis 15488 ( 7.3%) I 213062 0
[character] 2. Juniperus_monosperma 10201 ( 4.8%) (100.0%) (0.0%)
3. Amelanchier_arborea 10182 ( 4.8%)
4. Abies_procera 10141 ( 4.8%)
5. Tsuga_mertensiana 9684 ( 4.5%)
6. Acer_saccharum 8403 ( 3.9%)
7. Fraxinus_americana 7997 ( 3.8%)
8. Acer_rubrum 6231 ( 2.9%)
9. Tsuga_canadensis 6078 ( 2.9%)
10. Sassafras_albidum 5200 ( 2.4%)
[ 131 others ] 123457 (57.9%) IIIIIIIIIII
-----------------------------------------------------------------------------------------------------
Data Frame Summary
dt
Dimensions: 213062 x 1
Duplicates: 211712
-----------------------------------------------------------------------------------------------
No Variable Stats / Values Freqs (% of Valid) Graph Valid Missing
---- ----------- --------------------------- ---------------------- ------- --------- ---------
1 count Mean (sd) : 34.6 (2481.6) 1349 distinct values : 209047 4015
[numeric] min < med < max: : (98.1%) (1.9%)
0 < 0 < 1114340 :
IQR (CV) : 4 (71.7) :
:
-----------------------------------------------------------------------------------------------
distinct() - Extract unique values
mutate() - Create new variables
filter() - Filter data by cases
group_by() - Group data by cases
summarise() - Summarise data by cases
case_when() - Categorise data
Levels of a variable:
Equivalent in base R:
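To illustrate extracting the levels of a variable, here is a minimal sketch on a toy tibble comparing distinct() with base R's unique(). The course's own calls are not shown, so the details are an assumption.

```r
library(dplyr)

# Toy data with a repeated species
toy <- tibble(
  species_name = c("Abies_amabilis", "Acer_rubrum", "Abies_amabilis")
)

# dplyr: returns a tibble with one row per unique value
levels_tbl <- toy |> distinct(species_name)

# base R: returns a plain character vector
levels_vec <- unique(toy$species_name)
```

Both keep the values in order of first appearance; the difference is the type of object returned.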
E.g. convert fruit counts to fruits per m2:
mutate: new variable 'fruits_per_m2' (double) with 2,116 unique values and 35% NA
# A tibble: 213,062 × 9
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 213,052 more rows
# ℹ 1 more variable: fruits_per_m2 <dbl>
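The conversion to fruits per m2 can be sketched on toy values as below. The division of count by trap_area_m2 matches the one used later in the document; the numbers are invented.

```r
library(dplyr)

# Toy data: the last trap has no recorded area
toy <- tibble(
  count = c(3, 10, 22),
  trap_area_m2 = c(0.25, 0.5, NA)
)

# mutate() adds a new column computed from existing ones;
# rows with a missing area get NA in the new variable
with_density <- toy |>
  mutate(fruits_per_m2 = count / trap_area_m2)

with_density
```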
filter: removed 208,641 rows (98%), 4,421 rows remaining
# A tibble: 4,421 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
2 BNZ 1957 Picea_glauca <NA> 1 TRAP NA 0.25
3 BNZ 1957 Picea_glauca <NA> 2 TRAP NA 0.25
4 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
5 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
6 BNZ 1957 Picea_glauca <NA> 0 TRAP NA 0.25
7 BNZ 1957 Picea_glauca <NA> 2 TRAP NA 0.25
8 BNZ 1957 Picea_glauca <NA> 3 TRAP NA 0.25
9 BNZ 1957 Picea_glauca <NA> 0 TRAP NA 0.25
10 BNZ 1957 Picea_glauca <NA> 1 TRAP NA 0.25
# ℹ 4,411 more rows
filter: removed 150,475 rows (71%), 62,587 rows remaining
filter: removed 41,369 rows (66%), 21,218 rows remaining
# A tibble: 21,218 × 8
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
3 AND 1976 Abies_amabilis CNCT_01ABAM1 12 PARTIALCO… NA NA
4 AND 1978 Abies_amabilis CNCT_01ABAM1 21 PARTIALCO… NA NA
5 AND 1980 Abies_amabilis CNCT_01ABAM1 30 PARTIALCO… NA NA
6 AND 1982 Abies_amabilis CNCT_01ABAM1 61 PARTIALCO… NA NA
7 AND 1985 Abies_amabilis CNCT_01ABAM1 76 PARTIALCO… NA NA
8 AND 1991 Abies_amabilis CNCT_01ABAM1 42 PARTIALCO… NA NA
9 AND 1995 Abies_amabilis CNCT_01ABAM1 75 PARTIALCO… NA NA
10 AND 1997 Abies_amabilis CNCT_01ABAM1 52 PARTIALCO… NA NA
# ℹ 21,208 more rows
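The filter() calls behind the outputs above are not shown. As a hedged sketch, these are common ways to filter rows, written on a toy tibble with conditions chosen only for illustration.

```r
library(dplyr)

# Toy data, not the course dataset
toy <- tibble(
  site = c("AND", "BNZ", "BNZ", "CWT"),
  species_name = c("Abies_amabilis", "Picea_glauca", "Picea_glauca", "Acer_rubrum"),
  count = c(22, 0, 3, NA)
)

# Keep the rows of one species
one_species <- toy |> filter(species_name == "Picea_glauca")

# Several conditions in one call are combined with AND
both <- toy |> filter(site == "BNZ", count > 0)

# The same result with two chained filter() calls
chained <- toy |>
  filter(site == "BNZ") |>
  filter(count > 0)

# Keep only rows with a non-missing count
non_missing <- toy |> filter(!is.na(count))
```

Note that filter() drops rows where the condition evaluates to NA, so count > 0 also removes rows with a missing count.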
group_by: one grouping variable (site)
summarise: now 9 rows and 2 columns, ungrouped
# A tibble: 9 × 2
site fruits
<chr> <dbl>
1 AEC 55731.
2 AND 1968048
3 BNZ 915902
4 CDR 85431
5 CWT 292939
6 HBR 24556.
7 HFR 3683
8 LUQ 3653588
9 SEV 231905.
dt |>
  group_by(site) |>
  summarise(max_fruit = max(count, na.rm = TRUE),
            min_fruit = min(count, na.rm = TRUE))
group_by: one grouping variable (site)
summarise: now 9 rows and 3 columns, ungrouped
# A tibble: 9 × 3
site max_fruit min_fruit
<chr> <dbl> <dbl>
1 AEC 591 0
2 AND 5000 0
3 BNZ 7230 0
4 CDR 151 0
5 CWT 1383 0
6 HBR 244 0
7 HFR 77 0
8 LUQ 1114340 0
9 SEV 1100 0
Create a dataset with the mean fruit count of each tree species per site and per year:
dt |>
  group_by(site, species_name, year) |>
  summarise(mean_fruits = mean(count, na.rm = TRUE)) |>
  ungroup()
group_by: 3 grouping variables (site, species_name, year)
summarise: now 3,212 rows and 4 columns, 2 group variables remaining (site, species_name)
ungroup: no grouping variables
# A tibble: 3,212 × 4
site species_name year mean_fruits
<chr> <chr> <dbl> <dbl>
1 AEC Acer_rubrum 1988 0
2 AEC Acer_rubrum 1989 3.1
3 AEC Acer_rubrum 1990 0.44
4 AEC Acer_rubrum 1991 9.36
5 AEC Acer_rubrum 1992 3.90
6 AEC Acer_rubrum 1993 4.45
7 AEC Acer_rubrum 1994 9.75
8 AEC Acer_rubrum 1995 6.52
9 AEC Acer_rubrum 1996 6.86
10 AEC Acer_rubrum 1997 1.80
# ℹ 3,202 more rows
Create a new variable based on different levels of fruit counts.
E.g. a categorical variable with 3 levels of fruit quantity:
filter: removed 4,015 rows (2%), 209,047 rows remaining
filter: removed 126,766 rows (61%), 82,281 rows remaining
select: dropped 7 variables (site, year, species_name, plant_ID, method, …)
count
Min. : 0.1
1st Qu.: 2.0
Median : 8.0
Mean : 87.9
3rd Qu.: 35.0
Max. :1114340.0
dt |>
  mutate(nivel_frutos = case_when(
    count <= 100 ~ "bajo",
    count > 100 & count <= 1000 ~ "medio",
    count > 1000 ~ "alto"))
mutate: new variable 'nivel_frutos' (character) with 4 unique values and 2% NA
# A tibble: 213,062 × 9
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCO… 56.6 NA
2 AND 1963 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
3 AND 1964 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
4 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
5 AND 1966 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
6 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCO… NA NA
7 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCO… NA NA
8 AND 1969 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
9 AND 1970 Abies_amabilis CNCT_01ABAM1 0 PARTIALCO… NA NA
10 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCO… NA NA
# ℹ 213,052 more rows
# ℹ 1 more variable: nivel_frutos <chr>
Count the number of trees at each fruit level:
dt |>
  mutate(nivel_frutos = case_when(
    count <= 100 ~ "bajo",
    count > 100 & count <= 1000 ~ "medio",
    count > 1000 ~ "alto")) |>
  group_by(nivel_frutos) |>
  summarise(trees = n())
mutate: new variable 'nivel_frutos' (character) with 4 unique values and 2% NA
group_by: one grouping variable (nivel_frutos)
summarise: now 4 rows and 2 columns, ungrouped
# A tibble: 4 × 2
nivel_frutos trees
<chr> <int>
1 alto 698
2 bajo 200157
3 medio 8192
4 <NA> 4015
arrange() - Sort rows by variable values
rename() - Rename variables
relocate() - Reorder variables
select() - Extract variables
distinct() - Extract unique values
mutate() - Create new variables
filter() - Filter data by cases
group_by() - Group data by cases
summarise() - Summarise data by cases
case_when() - Categorise data
The if_else() function:
dt_fix <- dt |>
  # remove an erroneous value
  mutate(count = if_else(count > 200000, NA, count)) |>
  # compute the number of fruits per m2
  mutate(fruits_per_m2 = count / trap_area_m2) |>
  # create a variable holding fruits per m2 where available, otherwise the raw count
  mutate(fruits = if_else(is.na(fruits_per_m2), count, fruits_per_m2)) |>
  # drop rows where count is 0 or NA
  filter(count != 0)
mutate: changed one value (<1%) of 'count' (1 new NA)
mutate: new variable 'fruits_per_m2' (double) with 2,115 unique values and 35% NA
mutate: new variable 'fruits' (double) with 2,305 unique values and 2% NA
filter: removed 130,782 rows (61%), 82,280 rows remaining
library(tidyr)
pivot_wider()
pivot_longer()
# A tibble: 6 × 10
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1962 Abies_amabilis CNCT_01ABAM1 22 PARTIALCON… 56.6 NA
2 AND 1965 Abies_amabilis CNCT_01ABAM1 2 PARTIALCON… NA NA
3 AND 1967 Abies_amabilis CNCT_01ABAM1 2 PARTIALCON… NA NA
4 AND 1968 Abies_amabilis CNCT_01ABAM1 108 PARTIALCON… NA NA
5 AND 1971 Abies_amabilis CNCT_01ABAM1 7 PARTIALCON… NA NA
6 AND 1974 Abies_amabilis CNCT_01ABAM1 2 PARTIALCON… NA NA
# ℹ 2 more variables: fruits_per_m2 <dbl>, fruits <dbl>
First we create a reduced dataset:
group_by: 2 grouping variables (site, year)
summarise: now 280 rows and 3 columns, one group variable remaining (site)
# A tibble: 280 × 3
# Groups: site [9]
site year fruits
<chr> <dbl> <dbl>
1 AEC 1988 252.
2 AEC 1989 656.
3 AEC 1990 67.3
4 AEC 1991 148.
5 AEC 1992 279.
6 AEC 1993 66.1
7 AEC 1994 375.
8 AEC 1995 343.
9 AEC 1996 250.
10 AEC 1997 233.
# ℹ 270 more rows
Convert to wide format:
dt_short <- dt_fix |>
  group_by(site, year) |>
  summarise(fruits = mean(fruits, na.rm = TRUE)) |>
  pivot_wider(names_from = "site",
              values_from = "fruits")
group_by: 2 grouping variables (site, year)
summarise: now 280 rows and 3 columns, one group variable remaining (site)
pivot_wider: reorganized (site, fruits) into (AEC, AND, BNZ, CDR, CWT, …) [was 280x3, now 65x10]
# A tibble: 6 × 10
year AEC AND BNZ CDR CWT HBR HFR LUQ SEV
<dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 1988 252. 27.0 198. NA NA NA NA NA NA
2 1989 656. 46.6 1406. NA NA NA NA NA NA
3 1990 67.3 26.2 909. NA NA NA NA NA NA
4 1991 148. 140. 1318. NA 154. NA NA NA NA
5 1992 279. 28.4 357. NA 101. NA NA 249. NA
6 1993 66.1 65.1 1683. NA 124. 43.9 NA 269. NA
Convert to long format:
pivot_longer: reorganized (AEC, AND, BNZ, CDR, CWT, …) into (site, fruits) [was 65x10, now 585x3]
# A tibble: 585 × 3
year site fruits
<dbl> <chr> <dbl>
1 1988 AEC 252.
2 1988 AND 27.0
3 1988 BNZ 198.
4 1988 CDR NA
5 1988 CWT NA
6 1988 HBR NA
7 1988 HFR NA
8 1988 LUQ NA
9 1988 SEV NA
10 1989 AEC 656.
# ℹ 575 more rows
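The pivot_longer() call that produced the long table above is not shown. A minimal sketch of the round trip on toy data follows; the names_to and values_to values are chosen to mirror the output and are otherwise an assumption.

```r
library(dplyr)
library(tidyr)

# Toy long-format data: site AND has no record for 1989
long <- tibble(
  site = c("AEC", "AEC", "AND"),
  year = c(1988, 1989, 1988),
  fruits = c(252, 656, 27)
)

# Long to wide: one column per site, missing combinations become NA
wide <- long |>
  pivot_wider(names_from = site, values_from = fruits)

# Wide to long: every column except year is stacked back into site / fruits
back_to_long <- wide |>
  pivot_longer(cols = -year, names_to = "site", values_to = "fruits")

back_to_long
```

The round trip returns more rows than it started with, because the NA cells created by pivot_wider() are kept as explicit rows. The same effect explains why 280 rows become 585 in the output above.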
join
We read a new dataset with trait information for the tree species:
Rows: 104 Columns: 17
── Column specification ────────────────────────────────────────────────────────
Delimiter: ","
chr (15): species_name, family, genus, epithet, pollinator_code, mycorrhiza_...
dbl (2): seed_development_years, seed_mass_mg
ℹ Use `spec()` to retrieve the full column specification for this data.
ℹ Specify the column types or set `show_col_types = FALSE` to quiet this message.
Rows: 104
Columns: 17
$ species_name <chr> "Abies_amabilis", "Abies_concolor", "Abies_gran…
$ family <chr> "Pinaceae", "Pinaceae", "Pinaceae", "Pinaceae",…
$ genus <chr> "Abies", "Abies", "Abies", "Abies", "Abies", "A…
$ epithet <chr> "amabilis", "concolor", "grandis", "lasiocarpa"…
$ seed_development_years <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ pollinator_code <chr> "wind", "wind", "wind", "wind", "wind", "wind",…
$ mycorrhiza_type <chr> "EM", "EM", "EM", "EM", "EM", "EM", "AM", "AM",…
$ needleleaf_broadleaf <chr> "needleleaf", "needleleaf", "needleleaf", "need…
$ deciduous_evergreen <chr> "evergreen", "evergreen", "evergreen", "evergre…
$ seed_maturation_timing <chr> "late summer", "fall", "late summer", "late sum…
$ seed_mass_mg <dbl> 46.2063354, 34.2847056, 21.0800075, 13.7327226,…
$ sexual_system <chr> "monoecious", "monoecious", "monoecious", "mono…
$ shade_tolerance <chr> "tolerant", "tolerant", "tolerant", "tolerant",…
$ growth_form <chr> "tree", "tree", "tree", "tree", "tree", "tree",…
$ seed_bank <chr> "no", "no", "no", "no", "yes", "yes", "no", "no…
$ fleshy_fruit <chr> "no", "no", "no", "no", "no", "no", "no", "no",…
$ dispersal_syndrome <chr> "abiotic", "abiotic", "abiotic", "abiotic", "ab…
The count() function counts the number of cases for each level of a categorical variable.
# A tibble: 2 × 2
pollinator_code n
<chr> <int>
1 animal 73
2 wind 31
# A tibble: 41 × 2
family n
<chr> <int>
1 Aceraceae 2
2 Annonaceae 2
3 Aquifoliaceae 1
4 Araliaceae 2
5 Arecaceae 1
6 Betulaceae 4
7 Bignoniaceae 2
8 Boraginaceae 2
9 Burseraceae 2
10 Cecropiaceae 1
# ℹ 31 more rows
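A minimal sketch of count() on a toy tibble, together with the group_by() plus summarise() form it abbreviates. The data are invented; only the column name pollinator_code comes from the dataset.

```r
library(dplyr)

# Toy data
toy <- tibble(
  pollinator_code = c("wind", "animal", "wind", "animal", "animal")
)

# count(): one row per level, with the number of cases in column n
tally <- toy |> count(pollinator_code)

# Equivalent, written out with group_by() and summarise()
tally_long_form <- toy |>
  group_by(pollinator_code) |>
  summarise(n = n())

tally
```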
Using left_join():
Rows: 82,280
Columns: 26
$ site <chr> "AND", "AND", "AND", "AND", "AND", "AND", "AND"…
$ year <dbl> 1962, 1965, 1967, 1968, 1971, 1974, 1976, 1978,…
$ species_name <chr> "Abies_amabilis", "Abies_amabilis", "Abies_amab…
$ plant_ID <chr> "CNCT_01ABAM1", "CNCT_01ABAM1", "CNCT_01ABAM1",…
$ count <dbl> 22, 2, 2, 108, 7, 2, 12, 21, 1, 30, 61, 76, 5, …
$ method <chr> "PARTIALCONECOUNT", "PARTIALCONECOUNT", "PARTIA…
$ stem_cm <dbl> 56.6, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, N…
$ trap_area_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fruits_per_m2 <dbl> NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA,…
$ fruits <dbl> 22, 2, 2, 108, 7, 2, 12, 21, 1, 30, 61, 76, 5, …
$ family <chr> "Pinaceae", "Pinaceae", "Pinaceae", "Pinaceae",…
$ genus <chr> "Abies", "Abies", "Abies", "Abies", "Abies", "A…
$ epithet <chr> "amabilis", "amabilis", "amabilis", "amabilis",…
$ seed_development_years <dbl> 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2,…
$ pollinator_code <chr> "wind", "wind", "wind", "wind", "wind", "wind",…
$ mycorrhiza_type <chr> "EM", "EM", "EM", "EM", "EM", "EM", "EM", "EM",…
$ needleleaf_broadleaf <chr> "needleleaf", "needleleaf", "needleleaf", "need…
$ deciduous_evergreen <chr> "evergreen", "evergreen", "evergreen", "evergre…
$ seed_maturation_timing <chr> "late summer", "late summer", "late summer", "l…
$ seed_mass_mg <dbl> 46.20634, 46.20634, 46.20634, 46.20634, 46.2063…
$ sexual_system <chr> "monoecious", "monoecious", "monoecious", "mono…
$ shade_tolerance <chr> "tolerant", "tolerant", "tolerant", "tolerant",…
$ growth_form <chr> "tree", "tree", "tree", "tree", "tree", "tree",…
$ seed_bank <chr> "no", "no", "no", "no", "no", "no", "no", "no",…
$ fleshy_fruit <chr> "no", "no", "no", "no", "no", "no", "no", "no",…
$ dispersal_syndrome <chr> "abiotic", "abiotic", "abiotic", "abiotic", "ab…
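The join call itself is not shown above. As a sketch under that caveat, this is how left_join() attaches trait columns to observations by a shared key, using two toy tibbles whose names (observations, traits) are hypothetical.

```r
library(dplyr)

# Toy observations: one species has no entry in the traits table
observations <- tibble(
  species_name = c("Abies_amabilis", "Abies_amabilis", "Unknown_sp"),
  count = c(22, 2, 5)
)

# Toy traits table: one row per species
traits <- tibble(
  species_name = c("Abies_amabilis", "Acer_rubrum"),
  family = c("Pinaceae", "Aceraceae")
)

# left_join() keeps every row of the left table and adds matching columns
# from the right table; unmatched rows get NA in the new columns
joined <- observations |>
  left_join(traits, by = "species_name")

joined
```

This matches the pattern in the glimpse above, where the number of rows is unchanged (82,280) and the trait columns are appended to each observation.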
write_csv() - uses "," as the separator
write_csv2() - uses ";" as the separator
write_delim() - uses any separator (e.g. delim = "|")
readr, readxl, and googlesheets4
dplyr
tidyr
stringr
forcats
lubridate
Using the final dataset (dt_sp), select the records that have stem diameter information (stem_cm) and sort them from largest to smallest:
filter: removed 80,777 rows (98%), 1,503 rows remaining
# A tibble: 1,503 × 26
site year species_name plant_ID count method stem_cm trap_area_m2
<chr> <dbl> <chr> <chr> <dbl> <chr> <dbl> <dbl>
1 AND 1993 Abies_procera CNCT_37ABPR17 250 PARTIALCO… 221. NA
2 AND 1962 Abies_procera CNCT_15ABPR15 50 PARTIALCO… 198. NA
3 AND 1962 Abies_procera CNCT_15ABPR8 68 PARTIALCO… 196. NA
4 AND 1993 Abies_procera CNCT_37ABPR1 350 PARTIALCO… 186. NA
5 AND 1961 Abies_procera CNCT_37ABPR3 231 PARTIALCO… 186. NA
6 AND 1993 Abies_procera CNCT_37ABPR3 410 PARTIALCO… 184. NA
7 AND 1962 Abies_procera CNCT_15ABPR16 130 PARTIALCO… 183. NA
8 AND 1992 Abies_procera CNCT_02ABPR35 54 PARTIALCO… 180. NA
9 AND 1962 Abies_procera CNCT_15ABPR3 300 PARTIALCO… 178. NA
10 AND 1993 Abies_procera CNCT_37ABPR10 250 PARTIALCO… 174. NA
# ℹ 1,493 more rows
# ℹ 18 more variables: fruits_per_m2 <dbl>, fruits <dbl>, family <chr>,
# genus <chr>, epithet <chr>, seed_development_years <dbl>,
# pollinator_code <chr>, mycorrhiza_type <chr>, needleleaf_broadleaf <chr>,
# deciduous_evergreen <chr>, seed_maturation_timing <chr>,
# seed_mass_mg <dbl>, sexual_system <chr>, shade_tolerance <chr>,
# growth_form <chr>, seed_bank <chr>, fleshy_fruit <chr>, …
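The solution code for this exercise is not shown. One possible approach, sketched on a toy tibble with invented values, is to drop the rows with missing stem_cm and then sort in descending order.

```r
library(dplyr)

# Toy data: one tree has no recorded stem diameter
toy <- tibble(
  plant_ID = c("a", "b", "c", "d"),
  stem_cm = c(56.6, NA, 221, 104)
)

# Keep rows with a stem diameter, then sort from largest to smallest
largest_first <- toy |>
  filter(!is.na(stem_cm)) |>
  arrange(desc(stem_cm))

largest_first
```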
Using the final dataset (dt_sp), compute the mean and standard deviation of stem diameter for each tree species.
dt_sp |>
  filter(!is.na(stem_cm)) |>
  group_by(species_name) |>
  summarise(mean = mean(stem_cm),
            sd = sd(stem_cm))
filter: removed 80,777 rows (98%), 1,503 rows remaining
group_by: one grouping variable (species_name)
summarise: now 10 rows and 3 columns, ungrouped
# A tibble: 10 × 3
species_name mean sd
<chr> <dbl> <dbl>
1 Abies_amabilis 65.0 19.7
2 Abies_concolor 63.1 18.6
3 Abies_grandis 74.5 14.8
4 Abies_lasiocarpa 45.0 16.2
5 Abies_magnifica 87.9 19.9
6 Abies_procera 104. 34.4
7 Picea_engelmannii 80.2 16.5
8 Pinus_lambertiana 114. 27.7
9 Pinus_monticola 63.4 22.4
10 Tsuga_mertensiana 56.5 12.5
Using the dataset (dt), compute the number of trees and the number of species with a stem diameter of at least 40 cm and below 40 cm.
dt |>
  filter(!is.na(stem_cm)) |>
  mutate(tree_size = case_when(stem_cm >= 40 ~ "big",
                               stem_cm < 40 ~ "small")) |>
  group_by(tree_size) |>
  summarise(n_trees = n(),
            n_species = n_distinct(species_name))
filter: removed 210,780 rows (99%), 2,282 rows remaining
mutate: new variable 'tree_size' (character) with 2 unique values and 0% NA
group_by: one grouping variable (tree_size)
summarise: now 2 rows and 3 columns, ungrouped
# A tibble: 2 × 3
tree_size n_trees n_species
<chr> <int> <int>
1 big 2144 10
2 small 138 6
Using the final dataset (dt_sp), select the sites with the "TRAP" counting method and compute the maximum and minimum number of fruits per m2 for each site.
dt_sp |>
  filter(method == "TRAP") |>
  group_by(site) |>
  summarise(max_fruit = max(fruits_per_m2),
            # min() gives the minimum the exercise asks for; the min_fruit
            # values printed below were produced with mean()
            min_fruit = min(fruits_per_m2))
filter: removed 33,191 rows (40%), 49,089 rows remaining
group_by: one grouping variable (site)
summarise: now 5 rows and 3 columns, ungrouped
# A tibble: 5 × 3
site max_fruit min_fruit
<chr> <dbl> <dbl>
1 AEC 8107. 296.
2 BNZ 28920 1088.
3 CWT 12207. 197.
4 HBR 2440 93.4
5 LUQ 213300 299.
Using the final dataset (dt_sp), create a table comparing the total fruits counted at sites CWT and SEV (in columns) for the years 2000-2010 (in rows).
dt_sp |>
  filter(site %in% c("CWT", "SEV")) |>
  filter(year %in% 2000:2010) |>
  group_by(site, year) |>
  summarise(fruits = sum(fruits)) |>
  pivot_wider(names_from = site, values_from = fruits)
# A tibble: 11 × 3
year CWT SEV
<dbl> <dbl> <dbl>
1 2000 92776. 5561.
2 2001 133160. 28243.
3 2002 45746. 302.
4 2003 63213. 13646.
5 2004 67092. 23964.
6 2005 47034. 20558.
7 2006 84603. 726.
8 2007 114387. 11630.
9 2008 147617. 14634
10 2009 138707. 6
11 2010 139444. 15441.
Using the final dataset (dt_sp), create a table comparing the total fruits counted in each year from 2001 to 2005 (in columns) for the Abies species (in rows).
dt_sp |>
  filter(year %in% 2001:2005) |>
  filter(str_detect(species_name, "Abies")) |>
  group_by(year, species_name) |>
  summarise(fruits = sum(fruits)) |>
  pivot_wider(names_from = year, values_from = fruits)
# A tibble: 6 × 6
species_name `2001` `2002` `2003` `2004` `2005`
<chr> <dbl> <dbl> <dbl> <dbl> <dbl>
1 Abies_amabilis 721 2819 5907 54 864
2 Abies_concolor 1429 92 3032 NA 136
3 Abies_grandis 3509 238 4414 17 1119
4 Abies_magnifica 52 1374 6324 8 570
5 Abies_procera 3308 7772 10485 957 1588
6 Abies_lasiocarpa NA 443 1974 17 73